Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent
Authors
Abstract
A major challenge in understanding the generalization of deep learning is to explain why (stochastic) gradient descent can exploit the network architecture to find solutions with good generalization performance when using high-capacity models. We find simple but realistic examples showing that this phenomenon exists even when learning linear classifiers — between two linear networks with the same capacity, the one with a convolutional layer can generalize better than the other when the data distribution has some underlying spatial structure. We argue that this difference results from a combination of the convolutional architecture, the data distribution, and gradient descent, all of which must be included in a meaningful analysis. We provide a general analysis of generalization performance as a function of data distribution and convolutional filter size, with gradient descent as the optimization algorithm, and then interpret the results using concrete examples. Experimental results show that our analysis can explain the behavior of the examples we introduce.
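As a rough illustration of the claim (not the paper's actual experimental setup), the following numpy sketch trains two two-layer linear classifiers with gradient descent on a synthetic task with spatial structure: a fixed local pattern planted at a random position, whose sign determines the label. Both models compute a purely linear function of the input, but one parameterizes the first layer as a shared convolutional filter. The data distribution, model sizes, and hyperparameters here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 32, 4                  # input length, filter size
n_train, n_test = 40, 1000

def make_data(n):
    # Hypothetical spatially structured task: a fixed local pattern is
    # planted at a random position, and its sign determines the label.
    pattern = np.array([1.0, -1.0, 1.0, -1.0])
    X = 0.5 * rng.standard_normal((n, d))
    y = rng.choice([-1.0, 1.0], size=n)
    for i in range(n):
        pos = rng.integers(0, d - k + 1)
        X[i, pos:pos + k] += y[i] * pattern
    return X, y

def patches(X):
    # All length-k sliding windows (stride 1): shape (n, d-k+1, k).
    return np.stack([X[:, j:j + k] for j in range(d - k + 1)], axis=1)

def loss_grad(margin, y):
    # d/df of the logistic loss log(1 + exp(-y f)).
    return -y / (1.0 + np.exp(margin))

def train_conv(X, y, steps=3000, lr=0.2):
    # Two-layer linear net with a shared convolutional filter w:
    # f(x) = v . (patches(x) @ w). Linear in x, but factored.
    P = patches(X)
    w = 0.1 * rng.standard_normal(k)
    v = 0.1 * rng.standard_normal(d - k + 1)
    for _ in range(steps):
        h = P @ w
        g = loss_grad(y * (h @ v), y)
        w -= lr * np.einsum('n,npk,p->k', g, P, v) / len(y)
        v -= lr * (g[:, None] * h).mean(axis=0)
    return lambda X: patches(X) @ w @ v

def train_fc(X, y, steps=3000, lr=0.2):
    # Same end-to-end capacity (v^T W is a single linear classifier),
    # but the first layer is fully connected.
    W = 0.1 * rng.standard_normal((d, d))
    v = 0.1 * rng.standard_normal(d)
    for _ in range(steps):
        h = X @ W.T
        g = loss_grad(y * (h @ v), y)
        W -= lr * np.einsum('n,p,nk->pk', g, v, X) / len(y)
        v -= lr * (g[:, None] * h).mean(axis=0)
    return lambda X: X @ W.T @ v

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)
for name, train in [("conv", train_conv), ("fc", train_fc)]:
    f = train(Xtr, ytr)
    print(name, "test accuracy:", np.mean(np.sign(f(Xte)) == yte))
```

With enough training samples both models fit the task; the interesting regime is small n_train, where the convolutional parameterization tends to reach higher test accuracy, consistent with the abstract's claim. Exact numbers will vary with the seed and hyperparameters.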
Similar References
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are the current state-of-the-art models for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...
Traffic Sign Recognition Using Extreme Learning Classifier with Deep Convolutional Features
Traffic sign recognition is an important but challenging task, especially for automated driving and driver assistance. Its accuracy depends on two aspects: the feature extractor and the classifier. Current popular algorithms mainly use convolutional neural networks (CNNs) to execute both feature extraction and classification. Such methods can achieve impressive results but usually on the basis of an e...
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
Effective convolutional neural networks are trained on large sets of labeled data. However, creating large labeled datasets is a very costly and time-consuming task. Semi-supervised learning uses unlabeled data to train a model with higher accuracy when there is a limited set of labeled data available. In this paper, we consider the problem of semi-supervised learning with convolutional neural ...
Online learning from finite training sets and robustness to input
We analyse online gradient descent learning from finite training sets at non-infinitesimal learning rates. Exact results are obtained for the time-dependent generalization error of a simple model system: a linear network with a large number of weights N, trained on p = αN examples. This allows us to study in detail the effects of finite training set size on, for example, the optimal choice of learning...
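A minimal sketch of the kind of setup this snippet describes: a linear "student" with N weights trained by online gradient descent on a fixed finite set of p = αN examples generated by a "teacher". Everything here (the teacher/student construction and all parameter values) is an illustrative assumption, not the referenced paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100                      # number of weights
alpha = 0.5                  # training set size ratio: p = alpha * N
p = int(alpha * N)
eta, steps = 0.5, 5000       # non-infinitesimal learning rate, SGD steps

teacher = rng.standard_normal(N) / np.sqrt(N)   # target linear rule
X = rng.standard_normal((p, N))                  # fixed finite training set
y = X @ teacher                                  # noise-free targets

w = np.zeros(N)
for _ in range(steps):
    i = rng.integers(p)                  # online: one random training example
    err = X[i] @ w - y[i]
    w -= eta * err * X[i] / N            # squared-loss gradient step (1/N scaling)

# For Gaussian inputs, the generalization error E[(x . (w - teacher))^2]
# equals the squared parameter distance; it plateaus above zero because
# the finite training set only constrains w inside the data's span.
print("generalization error:", np.sum((w - teacher) ** 2))
```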
Learning Compact Neural Networks with Regularization
We study the impact of regularization for learning neural networks. Our goal is to speed up training, improve generalization performance, and train compact models that are cost-efficient. Our results apply to weight-sharing (e.g. convolutional), sparsity (i.e. pruning), and low-rank constraints among others. We first introduce the covering dimension of the constraint set and provide a Rademach...
Journal: CoRR
Volume: abs/1802.04420
Pages: -
Publication date: 2018